METHOD OF SURVEILLING INTERNET COMMUNICATION

BACKGROUND OF THE INVENTION

The present invention is directed to a method and system for enabling
surveillance and monitoring of network communications by analysis of the
data traversing the network.

A huge amount of traffic flows through today's computer networks, not
all of which is benign. Thus, the owner or supervisor of a given network
may well wish to track or "listen in" on it in real time in order to
monitor and/or secure the network effectively. Such monitoring or
surveillance can be achieved by connecting a probe to the network in order to
monitor data traveling between two or more nodes (e.g., user workstations)
on the network.

In a system where communication between two nodes takes the form of
discrete packets, the network probe can "read" a packet of data in order to
gather information, such as the source and destination addresses of the
packet, or the protocol of the packet. In addition, statistical and related
information can be computed, such as the average or total amount of traffic
of a certain protocol type during a given period of time, or the total
number of packets being sent to or from a node. This information may be
reported to a system administrator in real time, or may be stored for later
analysis.
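
The statistics mentioned above can be sketched as follows. This is only a
minimal illustration, assuming the probe has already decoded each packet into
a small record; the PacketRecord type, its field names and the sample
addresses are hypothetical and are not taken from any system described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketRecord:
    """Hypothetical summary of one captured packet."""
    src: str        # source address
    dst: str        # destination address
    protocol: str   # e.g. "TCP" or "UDP"
    size: int       # packet length in bytes


def summarize(packets):
    """Return (total bytes per protocol, packet count per node)."""
    bytes_per_protocol = Counter()
    packets_per_node = Counter()
    for p in packets:
        bytes_per_protocol[p.protocol] += p.size
        # A packet counts once for its sender and once for its receiver.
        packets_per_node[p.src] += 1
        packets_per_node[p.dst] += 1
    return bytes_per_protocol, packets_per_node


sample = [
    PacketRecord("10.0.0.1", "10.0.0.2", "TCP", 1500),
    PacketRecord("10.0.0.2", "10.0.0.1", "TCP", 40),
    PacketRecord("10.0.0.1", "10.0.0.3", "UDP", 512),
]
by_protocol, by_node = summarize(sample)
```

The two counters could be reported immediately or written to storage for
later analysis, matching the two options named in the text.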

Various attempts have already been made in this direction. For
example, Clear View Network Window, a software program available from
Clear Communications Corporation, of Lincolnshire, Ill., U.S.A., allegedly
offers predictive/proactive maintenance, intelligent root-cause analysis, and
proof-of-quality reports. However, the output is designed for network fault
management, which is not the same as "tapping" into a communication
between nodes in the network. Thus, the Clear View system does not allow
monitoring of data transferred between two nodes in the network with regard
to contents or characteristics.

Livermore National Laboratory, Livermore, Cal., U.S.A., developed a
group of computer programs to protect the computers of the U.S. Department
of Energy by "sniffing" data packets that travel across a local area network.
The United States Navy used one of these programs, known as the "iWatch"
program, in order to wiretap the communications of a suspected computer
hacker who had been breaking into computer systems at the U.S. Department
of Defense and NASA. The iWatch program uses a network probe to read all
packets that travel over a network and then "stores" this information in a
common database. A simple computer program can then be written to read
through the stored data, and to display only predefined "interesting" pieces of
information.

WO 01/91412 PCT/IL01/00471

Whenever an interesting piece of information is found, the stored data is
rescanned and a specific number of characters located on both sides of the
"interesting" piece are reported. These interesting characters are then
reviewed in order to determine the content of the message and are used as a
guide to future monitoring activity.
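
The rescanning step described above resembles a keyword-in-context search.
The following sketch is not the iWatch code; it merely assumes that the stored
data is available as a single string, and the function name, the width
parameter and the sample text are all hypothetical.

```python
def context_hits(stored_text, keyword, width=20):
    """Return each occurrence of keyword with up to `width` characters
    of surrounding text on both sides."""
    hits = []
    start = stored_text.find(keyword)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(stored_text), start + len(keyword) + width)
        hits.append(stored_text[lo:hi])
        # Continue searching just past the start of this occurrence.
        start = stored_text.find(keyword, start + 1)
    return hits


log = "user logged in; password=hunter2; session closed"
snippets = context_hits(log, "password", width=5)
```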

This system is restricted to historical analysis of user activities and does
not enable complete "tapping" of all user activities and full simulation of the
user's surfing activity.

Three major problems stand in the way of achieving continuous and
reliable tracking:

(a) Individual browsers do not report all the activities performed to a
web server. For example, when a browser loads web pages from its
browser cache space or from a proxy server, it does not send
requests to any "remote" web server through the cyberspace
autostrada;

(b) Application programs designed to work with certain features of
the web browser of one manufacturer are usually not compatible with
browsers from another vendor, because browser interface
mechanisms are different and proprietary to each one of them; and

(c) Individual browsers send their requests to web servers in a
non-systematic order. In other words, with regard to a given web
server, a preceding request has no relation to a subsequent
request. In processing requests, a web site has no control over
the sequence of the requests.

In an attempt to overcome these problems, US Patent No. 5,951,643
refers to a mechanism for dependably organizing and managing information
for web synchronization and tracking among multiple consumer browsers.

However, this solution is limited to tracking the activities of identified
users who have agreed to be "tapped" and who willingly cooperate by
connecting to the host with a designated application.

It is thus the prime object of the invention to provide a monitoring and
surveillance method and system enabling network communication suppliers
to tap any user connected to the network.

It is a further object of the invention to provide a tapping methodology
enabling network communication suppliers to watch in real time all user
activities while communicating over a network.

It is a still further object of the invention to enable web-site owners to
monitor and tap users contacting their web site.

SUMMARY OF THE INVENTION

Thus provided according to the present invention is a method of
tracking a network communication line by a network probe terminal ("terminal
agent") simulating the browser ("original browser") activity of a given
terminal, comprising the steps of: accessing the network communication line;
tracing TCP/IP data packets routed through the communication line; selecting
TCP/IP data packets relating to a given IP address ("identified data
packets"); selecting from the identified data packets current requests for new
connections ("original requests"); selecting from the identified data packets
current web-page components indicating new addresses ("new navigation
components"); dividing the new navigation components into two categories,
embedded objects or frames ("false new components") and hyperlinks ("true
new components"); classifying original requests matching the true new
components, or original requests failing to match any new navigation
components and belonging to the HTTP or POST type, as "primary requests",
and original requests matching the false new components as "secondary
requests"; selecting from the identified data packets HTML data files
relating to primary requests ("respective primary responses"); generating
"virtual" secondary requests according to the respective primary responses;
selecting from the identified data packets responses relating to the virtual
secondary requests ("respective secondary responses"); and simulating web
page presentation on the terminal agent according to the respective secondary
responses.

BRIEF DESCRIPTION OF THE DRAWINGS 

These and further features and advantages of the invention will
become more clearly understood in the light of the ensuing description of a
few preferred embodiments thereof, given by way of example only, with
reference to the accompanying drawings, wherein:

Fig. 1 illustrates a typical network configuration, in which the present
invention can be implemented;

Fig. 2 illustrates the terminal agent scheme of operation;

Fig. 3 illustrates the process of tracing and identifying TCP/IP data
packets;

Fig. 4 is a flowchart of classifying TCP/IP requests;

Fig. 5 is a flowchart of simulating the creation of virtual secondary
TCP/IP requests; and

Fig. 6 illustrates the process of simulating original browser activities.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Referring to Fig. 1, let us assume that terminals 01, 02... are connected
to the same communication line, where the communication line is used as an
internal network ("Intranet") or an external network such as the Internet.

According to the present invention it is proposed to connect a designated
network probe (hereinafter called "the Terminal Agent") to the data
communication line. Alternatively, the terminals 01, 02, etc., and the Terminal
Agent may be connected to different data communication lines, or located at
different local networks.

The general scheme of the Terminal Agent operation is illustrated in
Fig. 2.

The Terminal Agent is exposed to all data frames passing through the
communication line. The data frames may contain information transferred
between the terminals or external data transmission to external sources such
as Internet servers.

Let us further assume that the "owner" of the data communication line,
such as an ISP or the network of a private organization, is interested in
monitoring, in real time, the actual communication activities of a given
terminal when surfing the Internet.

The Terminal Agent first analyzes the data frames in order to trace
TCP/IP data packets. As illustrated in Fig. 3, the data is analyzed according
to the protocol hierarchy (see RFC 0793 of the Internet protocol suite):
first the local network protocol is analyzed and external data transmission
is filtered ("gateway level"), then Internet Protocol (IP) data frames are
identified, and finally TCP ("Transmission Control Protocol") data packets
of the "host level" are detected.
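
The layered analysis can be illustrated by the short sketch below. It is an
illustration only, under the assumptions that the local network protocol is
Ethernet II and that the IP layer is IPv4; neither assumption is stated in the
description, and the function name extract_tcp and the synthetic sample frame
are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800   # EtherType value for IPv4
IPPROTO_TCP = 6           # IP protocol number for TCP


def extract_tcp(frame: bytes):
    """Return (src_ip, dst_ip, src_port, dst_port) for a TCP-over-IPv4
    Ethernet frame, or None if the frame carries something else."""
    if len(frame) < 14:
        return None
    # Local network level: the EtherType follows the two 6-byte MAC addresses.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return None
    # IP level: header length is the low nibble of byte 0, in 32-bit words.
    ip = frame[14:]
    if len(ip) < 20:
        return None
    ihl = (ip[0] & 0x0F) * 4
    if ihl < 20 or ip[9] != IPPROTO_TCP or len(ip) < ihl + 4:
        return None
    src_ip = ".".join(str(b) for b in ip[12:16])
    dst_ip = ".".join(str(b) for b in ip[16:20])
    # TCP level: the first four bytes are the source and destination ports.
    src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
    return src_ip, dst_ip, src_port, dst_port


# Synthetic frame for demonstration: zeroed MACs, 20-byte IP and TCP headers.
eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
ip_hdr = (bytes([0x45, 0]) + bytes(7) + bytes([IPPROTO_TCP]) + bytes(2)
          + bytes([192, 168, 0, 5]) + bytes([10, 0, 0, 1]))
tcp_hdr = struct.pack("!HH", 49152, 80) + bytes(16)
result = extract_tcp(eth + ip_hdr + tcp_hdr)
```

Frames that fail any of the three checks are simply discarded, which
corresponds to the successive filtering levels named in the text.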

Upon analyzing the IP header of the data packets, the IP addresses of
the requesting terminal and of the message destination are identified. The
owner of the communication line can easily relate the IP address to the
user's terminal. It is therefore possible to filter out all irrelevant data
packets and restrict further processing to the data transmission of one
selected terminal (hereinafter called "the identified data packets").

The identified data packets are further analyzed according to the RFC
0793 specification, enabling full management and control of data
communication ports.

According to known routines for managing TCP data communication
ports, as carried out by conventional browsers, e.g. Internet Explorer, the
terminal which operates the browser is the original source of all data
transmission. For example, let us assume that the terminal placed a request
for the YAHOO! home page, which request is delivered through the network to
the YAHOO! server. In response, the server sends an HTML data file containing
all information on the components of the YAHOO! home page. Accordingly, the
browser sends new requests for receiving all components of the web page by
opening new "virtual" communication ports, where each port is used for
transmitting different components of the same web page. An "outsider"
terminal, exposed to all data requests and respective responses, is unable to
differentiate between initial "primary" requests, e.g. requesting the complete
YAHOO! home page, and "secondary" requests for receiving the components
thereof. For simulating the activity of the original browser by an "outsider"
probe terminal, it is essential to identify the primary requests as such.

Fig. 4 illustrates the process for differentiating the primary requests
from the secondary requests. Primary requests originate from different
operations, such as the user entering a new URL, choosing a hyperlink,
etc. Therefore, in order to detect them, one must analyze the previous
information transmitted to the same IP address. All new navigation
components (addressing the browser to a new location) of the web page
received by the terminal are sorted according to their type: all embedded
objects, frames, etc., are marked as "false" components, while hyperlinks are
marked as "true" components. All data is stored in the incoming responses
buffer database for later use.
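
The sorting of navigation components might be sketched as follows, using the
HTML parser of the Python standard library. The class name, the table of
embedded tags and the sample page are assumptions made for illustration; the
description itself does not enumerate which tags count as embedded objects.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NavigationComponents(HTMLParser):
    """Collect addresses from a page and mark each "true" or "false"."""

    # Assumed mapping: tag -> attribute a browser fetches automatically.
    EMBEDDED = {"img": "src", "script": "src", "frame": "src",
                "iframe": "src", "embed": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.marks = {}   # absolute URL -> "true" or "false"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Hyperlink: followed only when the user chooses it.
            url = urljoin(self.base_url, attrs["href"])
            self.marks.setdefault(url, "true")
        elif tag in self.EMBEDDED and attrs.get(self.EMBEDDED[tag]):
            # Embedded object or frame: requested by the browser itself.
            url = urljoin(self.base_url, attrs[self.EMBEDDED[tag]])
            self.marks[url] = "false"


page = '<img src="/logo.gif"><a href="news.html">News</a><frame src="menu.html">'
parser = NavigationComponents("http://example.com/")
parser.feed(page)
```

In this sketch the marks dictionary plays the part of the stored component
list against which later requests are checked.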

When a request for a new connection is identified according to TCP
analysis, the request is examined against the respective navigation
components (RNC) in the incoming responses buffer. If the RNC is marked as
"false", the request is ignored; if the RNC is marked as "true", the request
is classified as primary; otherwise, if there is no RNC relating to the said
request, the connection type should be identified: if the connection is of an
HTML type or "post" type, it is classified as a primary request.
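
The decision procedure of this paragraph can be written as a small function.
The sketch assumes that the marks of earlier navigation components are held
in a dictionary keyed by URL and that the probe can tell whether a connection
is of an HTML type; the parameter names are hypothetical. The text does not
say what happens to a request that matches no component and is of neither
type, so the "unclassified" result below is an assumption.

```python
def classify_request(url, method, is_html, marks):
    """Classify a new-connection request.

    marks maps URLs seen in earlier responses to "true" or "false".
    """
    mark = marks.get(url)
    if mark == "false":
        return "ignored"      # embedded object or frame
    if mark == "true":
        return "primary"      # the user followed a hyperlink
    # No matching navigation component: decide by connection type.
    if is_html or method.upper() == "POST":
        return "primary"      # e.g. a URL typed in by the user
    return "unclassified"     # assumption: case not covered by the text


marks = {"http://example.com/logo.gif": "false",
         "http://example.com/news.html": "true"}
typed_url = classify_request("http://example.org/", "GET", True, marks)
```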

In order to view and monitor the activities of a terminal, all "original"
browser activities must be reconstructed. For that purpose it is suggested to
use a "virtual" browser. This virtual browser possesses all the capabilities
of a "real" browser to download web pages from the Internet in real time.
However, its connection with the Internet is virtual in the sense that no
actual data exchange with the Internet servers is performed; it only
simulates the activities of the original "real" browser.

The first function of the virtual browser is illustrated in Fig. 5. The
virtual browser receives all primary requests of the "real" browser. These
primary requests and the respective primary responses from the Internet are
analyzed and processed according to conventional browser operation. However,
the resulting virtual secondary requests (used in a conventional browser to
complete the process of downloading web page components) are not
transferred through the Internet to the appropriate server as usual, but are
stored in the virtual "secondary" requests buffer database.

Although the virtual browser connection is not "real", all TCP protocol
management of opening and controlling port connections is processed by the
terminal agent as if the connections were "real" ones.

The final process of simulating and presenting the web pages in the
virtual browser is further illustrated in Fig. 6. All original secondary
responses coming through the communication line are analyzed and recorded in
the incoming responses buffer database. The virtual requests are compared to
the respective secondary responses stored in the incoming responses buffer
database, in the order of their arrival. If the respective secondary
responses already exist in the buffer, these responses are transferred to the
virtual browser and processed (according to conventional browser operation)
to present the visual picture of the respective web page components. As a
result, the terminal agent simulates in real time the exact process of
downloading Internet web pages as it has been performed by the original
terminal.
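
The comparison of virtual requests with buffered responses may be pictured
with the following sketch. It assumes that both the virtual requests and the
buffered responses are keyed by URL and that the virtual browser exposes a
rendering callback; the names deliver_ready and render are hypothetical.

```python
def deliver_ready(virtual_requests, incoming_responses, render):
    """Pass buffered responses to the virtual browser in request order.

    Returns the virtual requests that still await a response.
    """
    pending = []
    for url in virtual_requests:
        if url in incoming_responses:
            # The response has arrived: remove it from the buffer and render.
            render(url, incoming_responses.pop(url))
        else:
            pending.append(url)
    return pending


rendered = []
buffer = {"http://example.com/logo.gif": b"GIF89a"}
waiting = deliver_ready(
    ["http://example.com/logo.gif", "http://example.com/menu.html"],
    buffer,
    lambda url, body: rendered.append(url),
)
```

Requests left in the returned list are those for which no response has yet
been recorded in the buffer.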

In case the respective responses do not appear in the incoming
responses buffer database, the activity of the original local cache is
deduced. If the original local cache was not used with respect to the said
virtual request, the request is suspended in the buffer database until the
original respective secondary responses arrive. Otherwise, if the real local
cache was used for this response, the local cache of the virtual browser is
examined, and if the respective secondary responses exist in that local cache,
the respective response is transferred to the virtual browser and processed
as described above. In case the respective responses do not exist in the
virtual cache, either of the following alternatives may be applied. According
to one, "passive" version of the terminal agent, no further action is taken
to find the "missing" response, and an "error" message will appear at the
agent terminal instead of the web page component which appeared at the real
terminal. According to this version, the simulation of the real terminal is
not complete, but the tapping activity is undetectable. According to another,
"active" version, the terminal agent addresses the web page server to request
the "missing" response. Although this version enables the terminal agent to
present a more exact picture of the real terminal's activities, it is
traceable by more experienced terminal users, who are able to detect the
tapping activity.
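
The handling of a missing response, including the choice between the passive
and active versions, can be summarized by the sketch below. How the agent
deduces that the original cache was used is not detailed in the description,
so the sketch takes that deduction as an input flag; the function name, the
status strings and the fetch callback are hypothetical.

```python
def resolve_missing(url, original_used_cache, virtual_cache, mode, fetch=None):
    """Decide what to do when no response for url is in the buffer.

    Returns a (status, body) tuple.
    """
    if not original_used_cache:
        return ("suspended", None)   # wait until the real response arrives
    if url in virtual_cache:
        return ("from_cache", virtual_cache[url])
    if mode == "active" and fetch is not None:
        # Active version: contacts the server, so it may be detected.
        return ("fetched", fetch(url))
    # Passive version: undetectable, but the component is shown as an error.
    return ("error", None)


cache = {"http://example.com/logo.gif": b"GIF89a"}
hit = resolve_missing("http://example.com/logo.gif", True, cache, "passive")
miss = resolve_missing("http://example.com/menu.html", True, cache, "passive")
```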

According to a further mode of implementation of the present
invention, it is proposed to tap not only web page data packets, but also
to trace message data packets, e.g. e-mail or chats. To enable
such tapping, the same method and principles as described above are
applied to requests for receiving and sending messages through the network
other than requests for web pages. The process of analyzing such requests
and the respective responses is more streamlined, since there is no need to
check the cache memory activity, as by definition such information is always
new.

Finally, it should be appreciated that the above-described embodiments
are directed to an Internet communication environment. However, the invention
in its broad aspect is equally applicable to computerized network
communication in general, such as satellite, cellular and others.

While the above description contains many specificities, these should
not be construed as limitations on the scope of the invention, but rather as
exemplification of the preferred embodiments. Those skilled in the art will
envision other possible variations that are within its scope. Accordingly, the
scope of the invention should be determined not by the embodiments
illustrated, but by the appended claims and their legal equivalents.



